Classifying Unknown Proper Noun Phrases Without Context
Abstract
We present a probabilistic generative model for classifying unknown proper noun phrases into semantic categories. The core of the classifier is an n-gram character model, which is enhanced with an n-gram word-length model and a common word model. While most prior work has depended largely on context or domain-specific rules for semantic disambiguation of unknown names, we demonstrate that there is surprisingly reliable statistical information available in the composition of the names themselves. The context-independent probabilities assigned by our domain-independent classifier are sufficient to achieve greater than 90% classification accuracy on typical tasks.
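To make the idea of a per-category character n-gram model concrete, here is a minimal sketch in Python. It is not the authors' model: it covers only the character component (the word-length and common word models are omitted), and the class name `CharNgramClassifier`, the add-one smoothing, the `^`/`$` padding symbols, and the toy training names are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


class CharNgramClassifier:
    """Toy per-category character n-gram model (hypothetical, add-one smoothed)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(lambda: defaultdict(int))    # label -> n-gram -> count
        self.context_counts = defaultdict(lambda: defaultdict(int))  # label -> context -> count
        self.class_counts = defaultdict(int)                         # label -> number of names
        self.alphabet = set()

    def _ngrams(self, name):
        # Pad with start symbols and one end symbol so boundaries are modeled too.
        padded = "^" * (self.n - 1) + name.lower() + "$"
        for i in range(self.n - 1, len(padded)):
            yield padded[i - self.n + 1:i], padded[i]

    def train(self, examples):
        for name, label in examples:
            self.class_counts[label] += 1
            for context, char in self._ngrams(name):
                self.ngram_counts[label][context + char] += 1
                self.context_counts[label][context] += 1
                self.alphabet.add(char)

    def log_score(self, name, label):
        # log P(label) + sum over characters of log P(char | previous n-1 chars, label)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        for context, char in self._ngrams(name):
            num = self.ngram_counts[label].get(context + char, 0) + 1
            den = self.context_counts[label].get(context, 0) + vocab
            score += math.log(num / den)
        return score

    def classify(self, name):
        # Pick the category under which the name's character sequence is most probable.
        return max(self.class_counts, key=lambda label: self.log_score(name, label))


# Invented toy data, far too small for real use.
toy_examples = [
    ("Acme Corp", "company"), ("Globex Inc", "company"), ("Initech Corp", "company"),
    ("Prozac", "drug"), ("Zantac", "drug"), ("Lipitor", "drug"),
]
model = CharNgramClassifier(n=3)
model.train(toy_examples)
print(model.classify("Umbrella Corp"), model.classify("Lozac"))
```

The classifier never looks at the surrounding sentence: it scores the name's own character sequence under each category and returns the highest-scoring one, which is the context-free setting the abstract describes. A realistic version would need far more training names and a better smoothing scheme than add-one.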
Similar resources
“the”, “a”, and “”
The definite determiner “the” conveys a relation between the entity referred to by the noun phrase and the description provided by the noun phrase, and says that the entity is uniquely mutually identifiable in context by the speaker and hearer by virtue of that description. This characterization splits into six different cases. The indefinite determiner “a” and the bare plural (the empty string...
Tagging Unknown Proper Names Using Decision Trees
This paper describes a supervised learning method to automatically select from a set of noun phrases, embedding proper names of different semantic classes, their most distinctive features. The result of the learning process is a decision tree which classifies an unknown proper name on the basis of its context of occurrence. This classifier is used to estimate the probability distribution of an ...
Two Types of Definites: Evidence for Presupposition Cost
This paper investigates the notion of definiteness from a psycholinguistic perspective and addresses Löbner’s (1987) distinction between semantic and pragmatic definites. To this end inherently definite noun phrases, proper names, and indexicals are investigated as instances of (relatively) rigid designators (i.e. semantic definites) and contrasted with definite noun phrases and third person pr...
Identifying Multiple Topics in Texts
In this paper, we present an innovative method for multi-label text classification. Our method uses Lucene to index texts and then assigns one or more classes to a new text based on its similarity relative to an annotated corpus. For finer granularity, we split the text into phrases, and then we focus on the noun phrases. Instead of classifying the entire text, we classify each noun phrase. The...
Replicating Quantified Noun Phrases in Database Semantics
Predicate calculus treats determiner-noun sequences like the man, every man, or several men as ‘quantified noun phrases.’ This analysis in terms of quantifiers, variables, and connectives creates a major structural difference compared to the handling of proper names. The modeling of natural language communication in Database Semantics (DBS), in contrast, treats the functor-argument structure as p...